# Visual Document Retrieval

## OmniEmbed v0.1

A multimodal embedding model based on Qwen2.5-Omni-7B that produces unified embedding representations for cross-lingual text, images, audio, and video.

- License: MIT
- Tags: Multimodal Fusion
- Author: Tevatron
- Downloads: 2,190
- Likes: 3

## BiQwen2 v0.1

BiQwen2 is a visual retrieval model based on Qwen2-VL-2B-Instruct and the ColBERT strategy, focused on efficient visual document retrieval.

- License: Apache-2.0
- Tags: Text-to-Image, Safetensors, English
- Author: vidore
- Downloads: 460
- Likes: 0

## Nomic Embed Multimodal 7B

A 7-billion-parameter multimodal embedding model specialized for visual document retrieval, with strong results on the ViDoRe-v2 benchmark.

- License: Apache-2.0
- Tags: Text-to-Image, Supports Multiple Languages
- Author: nomic-ai
- Downloads: 741
- Likes: 26

## Nomic Embed Multimodal 3B

Nomic Embed Multimodal 3B is a multimodal embedding model for visual document retrieval that encodes text and images in a unified representation. It scores 58.8 NDCG@5 on the ViDoRe-v2 benchmark.

- Tags: Text-to-Image, Supports Multiple Languages
- Author: nomic-ai
- Downloads: 3,431
- Likes: 11

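
NDCG@5 scores how well the top five retrieved pages are ordered: each relevant hit is discounted by the logarithm of its rank, and the total is normalized by the best achievable ordering, so values fall between 0 and 1 (benchmark figures such as 58.8 are typically per-query averages reported on a 0 to 100 scale). The following is a minimal per-query sketch under stated assumptions: it uses the linear-gain form (some toolkits use 2^rel − 1 gain, which is identical for binary relevance), and the function names are hypothetical, not part of any benchmark's code.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k results (linear gain).

    The result at 0-based position i contributes rel / log2(i + 2),
    i.e. rank 1 is divided by log2(2) = 1, rank 2 by log2(3), and so on.
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float],
              judged_relevances: Sequence[float],
              k: int = 5) -> float:
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_relevances: relevance grade of each retrieved page, in ranked order.
    judged_relevances: grades of all pages judged relevant for the query,
        used to build the ideal (best possible) ordering.
    """
    ideal = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant pages exist for this query
    return dcg_at_k(ranked_relevances, k) / ideal


# Toy query: the single relevant page is retrieved at rank 2.
print(round(ndcg_at_k([0, 1, 0, 0, 0], judged_relevances=[1], k=5), 4))  # 0.6309
```

Retrieving the relevant page at rank 1 would give 1.0, and missing it in the top five gives 0.0 for that query.
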
## ColNomic Embed Multimodal 3B

ColNomic Embed Multimodal 3B is a 3-billion-parameter multimodal embedding model designed for visual document retrieval, with unified encoding of multilingual text and images.

- Tags: Multimodal Fusion, Supports Multiple Languages
- Author: nomic-ai
- Downloads: 4,636
- Likes: 17

## ColSmol 500M

A visual retrieval model based on SmolVLM-Instruct-500M and the ColBERT strategy, designed to index documents efficiently from their visual features.

- License: MIT
- Tags: Text-to-Image, Safetensors, English
- Author: vidore
- Downloads: 1,807
- Likes: 17

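
The "ColBERT strategy" named in several entries refers to late interaction: a query is encoded as one vector per token, a page image as one vector per patch, and relevance is the sum, over query tokens, of each token's maximum similarity to any page vector (MaxSim). The sketch below illustrates only this scoring step on hand-written toy vectors; the function names are illustrative, real embeddings would come from the model, and such embeddings are typically normalized so the dot product equals cosine similarity.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: each query-token embedding is matched to its
    most similar page-patch embedding, and the per-token maxima are summed."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: List[Vector],
               pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page ids ordered by descending MaxSim score."""
    scores = {page_id: maxsim_score(query_vecs, vecs) for page_id, vecs in pages.items()}
    return sorted(scores, key=scores.__getitem__, reverse=True)


# Toy 2-d embeddings; a real model emits one vector per query token / image patch.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.2, 0.1]],  # matches token 1 well, token 2 poorly
    "page_b": [[0.6, 0.6], [0.0, 1.0]],  # matches both tokens reasonably
}
print(rank_pages(query, pages))  # ['page_b', 'page_a']
```

Here page_a scores 1.0 + 0.1 = 1.1 and page_b scores 0.6 + 1.0 = 1.6. Keeping many vectors per page costs more index space than a single dense vector, in exchange for finer-grained token-to-patch matching.
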
## ColQwen2 v1.0

ColQwen2 is a visual retrieval model based on Qwen2-VL-2B-Instruct and the ColBERT strategy, designed to index documents efficiently from their visual features.

- License: Apache-2.0
- Tags: Text-to-Image, Safetensors, English
- Author: vidore
- Downloads: 106.85k
- Likes: 86

## DSE-Qwen2-2b-MRL-V1

DSE-Qwen2-2b-MRL-V1 is a dual-encoder model that encodes document screenshots into dense vectors for document retrieval.

- License: Apache-2.0
- Tags: Multimodal Fusion, Supports Multiple Languages
- Author: MrLight
- Downloads: 4,447
- Likes: 56

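
With a dual encoder, each document screenshot becomes a single dense vector, so retrieval reduces to nearest-neighbor search between the query vector and the page vectors. A minimal brute-force sketch, assuming the embeddings have already been produced by the model; the vectors, file names, and helper names are hypothetical, and a production system would typically use an approximate nearest-neighbor index instead of scoring every page.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          index: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force search: score every page vector against the query, keep the best k."""
    scored = [(page_id, cosine(query_vec, vec)) for page_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical index: one dense vector per document screenshot (toy 3-d values).
index = {
    "invoice.png": [0.9, 0.1, 0.0],
    "chart.png": [0.1, 0.9, 0.2],
    "table.png": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.0, 0.0]  # stand-in for the encoded text query
for page_id, score in top_k(query_vec, index, k=2):
    print(page_id, round(score, 3))
```

Because each page is stored as one vector, the index stays compact and fits standard vector-database tooling.
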
## ColPali v1.2

ColPali is a vision-language model based on PaliGemma-3B and the ColBERT strategy, designed to index documents efficiently from their visual features.

- License: MIT
- Tags: Text-to-Image, English
- Author: vidore
- Downloads: 61.77k
- Likes: 108

© 2025 AIbase